ABSTRACT



An object oriented dispatch optimization method determines 
statically which body of code will be executed when a 
method is dispatched. The program code is examined to 
identify all procedure bodies that can be invoked for a given 
class and a given method. An identified procedure body is 
analyzed to determine whether a method invocation on a 
pointer can invoke only one procedure body. Based on this 
analysis, either the procedure body or the invocation mecha- 
nism is changed so a unique procedure is directly called 
without a test or dispatch being used. 
OBJECT ORIENTED DISPATCH
OPTIMIZATION 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention generally relates to dispatch opti- 
mization in an object oriented environment and, more 
particularly, to a method for determining statically which 
body of code will be executed when a method is dispatched. 

2. Background Description 

Object oriented programming (OOP) is the preferred 
environment for building user-friendly, intelligent computer 
software. Key elements of OOP are data encapsulation, 
inheritance of attributes and polymorphism (i.e., overload- 
ing of operator names). While these three key elements are 
common to OOP languages, most OOP languages imple- 
ment the three key elements differently. 

In a conventional programming language, such as C or 
Pascal, procedures and functions are written to manipulate 
data and obtain solutions. In contrast, object oriented pro- 
gramming allows the programmer to view concepts as a 
variety of units or objects in a hierarchy, without worrying 
about the data type, repeated variable names, or function 
names. This allows the programmer to concentrate on the 
program design, rather than programming rules. The pro- 
grammer can represent relationships among components, 
objects, tasks to be performed, and conditions to be met in 
a way that allows the reuse of code components and reduces 
the bulkiness of code and the time and effort needed to 
develop programs. 

Examples of OOP languages are Smalltalk, Object Pascal, 
Java, and C++, all of which are well known in the art of 
computer languages, compilers, and applications program- 
ming. Of these, Smalltalk may be characterized as a pro- 
gramming environment instead of merely a language. Small- 
talk was developed in the Learning Research Group at 
Xerox's Palo Alto Research Center (PARC) in the early 
1970s. In Smalltalk, a message is sent to an object to 
evaluate the object itself. Messages perform a task similar to 
that of function calls in conventional programming lan- 
guages. The programmer does not need to be concerned with 
the type of data. Instead, the programmer need only be 
concerned with using the right message. Object Pascal, 
another OOP language, was created by developers from Apple
Computer, some of whom were involved in the development 
of Smalltalk at PARC, in conjunction with Niklaus Wirth, 
the designer of Pascal. C++ was developed by Bjarne 
Stroustrup at the AT&T Bell Laboratories in 1983 as an 
extension of C. C++ modules are compatible with C mod- 
ules and can be linked freely so that existing C libraries may 
be used with C++ programs. 

The key concept of all OOP is the class, which is a 
user-defined type. Classes provide object oriented program- 
ming features. Further information on object oriented pro- 
gramming concepts may be obtained by referring to one of 
several standard text books, such as Object Oriented Design 
with Applications by Grady Booch, The Benjamin/ 
Cummings Publishing Co., Inc. (1991). 

To focus this description on the instant invention, and 
minimize unnecessary discussion of background informa- 
tion known to persons skilled in the art, descriptive terms 
will be used where appropriate which, except where noted or 
where clear from the context, are familiar terms known to 
ones of ordinary skill in OOP. More particularly, as used 
herein, a class is a definition of a type of object, describing 
its methods and the kinds of data that can be placed in the
object. During execution of the program various objects are 
created. When an object is created based on a class, that is 
called an instance of that class and the names of the 
individual pieces of data are called instance variables. 

The procedures that can be invoked on the data in an 
instance of a class are the methods of that class. The class 
definition therefore contains descriptions of both the 
instance variables and the methods. 

Further, a class such as, for example, foo, can be defined 
in terms of another class such as, for example, bar, by 
declaring that foo is like bar except that it may contain 
additional instance variables, additional methods, or that 
some of the methods have a different definition. In this way 
the definition of foo can be much shorter than bar's. In this 
case, foo is said to inherit from bar. Foo is said to be a 
derived class of bar. Moreover, bar is said to be a base class 
of foo. If baz is a derived class of foo, then baz is also a
derived class of bar, which is to say the derived class (and
base class) relations are transitive.

A class and its derived class may contain different defi- 
nitions for an identically named method. For example, foo 
and bar may both have a method called, for example, 
addsub, and both have an instance variable named, for 
example, I, but when addsub is invoked on an instance of 
foo, I is increased by one, and when it is invoked on an 
instance of bar, I is decreased by one. The method body, also 
known as function body, for addsub when the instance is of 
type foo is the method body which increases I, and the 
method body for addsub when the instance is of type bar is 
that which decreases I by one. 
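
By way of illustration only, the addsub example may be sketched in C++ roughly as follows. The class names Bar and Foo and the helper apply_addsub are hypothetical names chosen for this sketch; only the method name addsub and the instance variable I come from the description above.

```cpp
#include <cassert>

// Hypothetical rendering of the text's "bar" (base class) and "foo"
// (derived class); the C++ spellings Bar and Foo are assumptions.
class Bar {
public:
    virtual ~Bar() = default;
    virtual void addsub() { I -= 1; }   // bar's method body decreases I by one
    int I = 0;                          // the instance variable named I
};

class Foo : public Bar {
public:
    void addsub() override { I += 1; }  // foo's method body increases I by one
};

// One call site serves instances of either class; which method body
// runs depends on the class of the instance that is passed in.
int apply_addsub(Bar& obj) {
    obj.addsub();
    return obj.I;
}
```

Passing a Foo instance to apply_addsub runs the increasing body, while passing a Bar instance runs the decreasing body, although the call site itself is identical.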

It is, in general, not possible to determine (either by 
compilers or by humans) when looking at a program which 
invokes a method on an instance what class that instance will 
be. The reason is that, as the program proceeds, the same 
code can be applied to instances of many different classes. 
Therefore, at the time the program is executing the code 
must determine the class of an instance and invoke the 
appropriate function body. This process of determining the 
class of an instance and invoking the appropriate function 
body is called method dispatch. Method dispatch consists of 
using both information about the class of the instance and 
the name of the method to determine which code should be 
executed next. 

In Smalltalk, all method dispatches are made dynamically. 
In other words, the actual method body that will be invoked 
at a particular method invocation site is determined by a 
combination of the class of the object and the name of the 
method. This may be done by following a pointer from the 
object to its class, and then looking up the method in the 
class's method table. In contrast, C++ allows the programmer
to decide whether a particular method should be dispatched
statically or dynamically. Methods that are dispatched
dynamically are called "virtual methods." The
present invention is concerned only with virtual method 
invocations and, therefore, methods that are dispatched 
statically will not be described. Accordingly, when the 
following description references C++ programs, the terms 
"method dispatch" or "method invocation," unless otherwise 
qualified, shall mean "virtual method dispatch" or "virtual 
method invocation." Note that in the C++ literature, a virtual 
method dispatch is also referred to as a "virtual function 
call." 
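
The table-driven lookup described above for Smalltalk might be modeled, as a minimal sketch only, by the following C++ code. The types ClassInfo and Object, the functions lookup and send, and the fallback to a base class when a name is missing from the table are all assumptions made for illustration; they are not taken from any particular Smalltalk or C++ implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Object;
using MethodBody = int (*)(Object&);

// A class record: a method table plus a link to the base class
// (nullptr when the class has no base).
struct ClassInfo {
    const ClassInfo* base;
    std::unordered_map<std::string, MethodBody> methods;
};

struct Object {
    const ClassInfo* cls;  // pointer from the object to its class
    int I;                 // an instance variable
};

// Dynamic dispatch: follow the pointer from the object to its class and
// look the method name up in that class's method table. Searching the
// base classes on a miss is an assumption of this sketch.
MethodBody lookup(const Object& obj, const std::string& name) {
    for (const ClassInfo* c = obj.cls; c != nullptr; c = c->base) {
        auto it = c->methods.find(name);
        if (it != c->methods.end()) return it->second;
    }
    return nullptr;  // no class in the chain defines the method
}

// Send a message: find the method body at run time, then invoke it.
int send(Object& obj, const std::string& name) {
    MethodBody body = lookup(obj, name);
    assert(body != nullptr && "no method body found for this message");
    return body(obj);
}

// Two method bodies that may be registered under the same name.
int decrease(Object& o) { return --o.I; }
int increase(Object& o) { return ++o.I; }
```

Every call to send pays for the table lookup and an indirect call, which is the kind of direct cost that a statically resolved call avoids.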

Method dispatches are a major source of complexity when 
trying to optimize the programs. More particularly, there is 
a direct cost associated with method dispatches in the form 
of the extra instructions required for the dynamic dispatch,
including extra memory operations, and pipeline penalties
caused by branching to an unknown address. For C++
programs, these costs have been estimated as ranging from
0% of the total run time, for programs that make no
significant use of method dispatch, to 6%, for programs that
make moderate use of method dispatch, to 27%, for programs
that make extensive use of method dispatch.

There is also an indirect cost relating to "inlining" which,
in many cases, may be even more significant. As is known
in the art, inlining is the process whereby a procedure call is
replaced with a copy of the body of the called procedure.
Inlining is much more important in object oriented languages
than in non-object oriented languages, such as C or
Fortran, because object oriented languages encourage
smaller, more modularized functions, and often have a
linguistic mechanism to support inlining. Studies have
shown that C++ functions do in fact have significantly
smaller static and dynamic instruction counts than C programs.
However, when a method dispatch is dynamic,
inlining cannot be performed.

Another cost associated with method dispatch is compilation
speed and compiled code size. The compiled code size
is increased, and hence the compilation speed is decreased
because, without any information about the potential targets
of a method dispatch, all the possible methods must be
linked into the program. This slows down linkage and causes
unused methods to bloat the object file.

There are methods in the relevant art directed to reducing,
at least partially, the above-identified problems relating to
inlining and compilation speed and compiled code size, but
each has limitations in performance, or requires so much
processing time as to be impracticable.

One of these methods, which will be referred to as the
Resolution by Unique Name method, is described by Brad
Calder and Dirk Grunwald (hereinafter referenced as
"Calder and Grunwald") in their article entitled "Reducing
Indirect Function Call Overhead in C++ Programs,"
Conference Record of the 21st ACM Symposium on Principles of
Programming Languages, Portland, Oreg., January 1994,
pp. 397-408. In the referenced article, Calder and Grunwald
observed that in C++ if there is only one method body
defined for a given method signature in the entire program,
then all dispatches for that method must be to that one
method body. The method signature or function signature in
a typed language like C++, Java, or Modula-3 is its name
and the number and types of its arguments; for untyped
languages like Smalltalk it is just the name and the number
of arguments.

It can be seen that if one can determine that for all
dispatches from a particular point in the program a particular
method body must be invoked, then that dispatch can be
replaced by a direct call to the appropriate body of code
without modifying the end result of the program, and hence
this replacement can safely be done automatically. Calder
and Grunwald have published some preliminary measurements
stating that, in their selected set of particular benchmark
programs, it is possible with the Resolution by Unique
Name Method to replace approximately one third of the
method dispatches by a direct call.

Calder and Grunwald's Resolution by Unique Name
Method can be summarized as being an algorithm for
determining when a method dispatch can go only to a single
method body, which consists of the steps of:

examining the entire program;

determining which methods have only one method body
associated with that method signature;

examining a method dispatch; and

determining whether that method is one of the methods
previously determined to have only one function and, if
so, converting the dispatch to a direct call.

There is at least one problem with the Resolution by
Unique Name Method, though, which is that the replacement
of a dispatch by a direct call cannot be done by a
traditional compiler. The major reason is that a compiler is
usually not given an entire program at a time, but only a
piece of the program. The entire program is composed by a
program called a linker. Hence, the Resolution by Unique
Name Method is generally referred to as a link-time
optimization. It is possible for Calder and Grunwald's method to
be performed at other times, if it is given appropriate
information. For example, if a database of information about
an entire program is built first, then when a compiler is used
to compile a piece of a program, it can refer to information
in the built database about the rest of the program. However,
if the rest of the program changes, and the information
which was relied upon changes, the program may have to be
recompiled using the new information.

Another related method directed to resolving virtual function
dispatch is called class hierarchy analysis (henceforth
"CHA"), and is described by Dean, Grove and Chambers in
"Optimization of Object-Oriented Programs Using Static
Class Hierarchy Analysis", in the Proceedings of the Ninth
European Conference on Object-Oriented Programming,
Springer-Verlag, 1995. CHA uses the type information supplied
only in statically typed languages and therefore is not
applicable to dynamically-typed OOP languages such as
Smalltalk.

The meaning of statically-typed, which relates to OOP
languages such as C++, is as follows: When a virtual function
call is made in C++, it is dispatched through a pointer or a
reference to an object, and that pointer has a particular type.
The type is either specified explicitly, i.e., statically, by the
programmer, or is statically derivable by the compiler from
other type information, as shown in the example below:

    class A { public: virtual void foo(); };
    class B: public A { public: virtual void foo(); A* top; };
    class C: public B { public: virtual void glorp(); };

    void f(B* p) {
        p->foo();       // Call via explicit type "B"
        p->top->foo();  // Call via implicit type "A"
    }

In either case, the compiler has statically obtainable
information about the declared type of the pointer through
which the dispatch is being made. The rules of the language
(common to C++, Java, and Modula-3) state that a pointer
whose static type is "B" can point to objects of type "B" or
any of its derived types (but not its base types). If A is a
base class from which B derives, and C in turn derives from
B, then a pointer "p" of static type "B" can actually point to
objects of types "B" or "C" at runtime, but not objects of
type "A".

CHA resolves a virtual function dispatch by computing
this set of possible dynamic types, and then determining the
virtual function that would be invoked for each of those
types. In the example above, the call "p->foo( )" is through
a pointer of static type "B", whose possible dynamic types
are "B" and "C". For dynamic type "B", the function
invoked would be B::foo( ); for dynamic type "C" the
function invoked would also be B::foo( ), since "C" does not
override foo and therefore inherits its definition from "B".
Since there is only one possible target function for all of the 
possible dynamic types that "p" can point to, the call can be 
resolved to "p->B::foo( ) ". 

In the second example, the pointer "p->top" is to an object 
of static type "A", so the possible dynamic types are "A",
"B", and "C". If "p->top" points to an object of type "B" or
"C", the virtual function B::foo( ) will be invoked, as we just
described. However, if "p->top" points to an object of type 
"A", then A::foo( ) will be invoked because "A" defines its 
own version of the "foo" function. Since there is more than 
one possible target function, the call cannot be resolved by 
CHA. 
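
The two CHA resolutions just discussed can be reproduced with a small model, offered here only as a sketch under simplifying assumptions. The Hierarchy structure, its members, and the helper example_hierarchy are hypothetical; single inheritance is assumed; and method bodies are identified by strings of the form "B::foo" purely for readability.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified class-hierarchy model (single inheritance only).
struct Hierarchy {
    std::map<std::string, std::string> base;               // class -> base ("" if none)
    std::map<std::string, std::set<std::string>> defines;  // class -> methods it defines

    // True if cls is `ancestor` itself or derives from it.
    bool derives_from(std::string cls, const std::string& ancestor) const {
        while (!cls.empty()) {
            if (cls == ancestor) return true;
            cls = base.at(cls);
        }
        return false;
    }

    // The method body invoked when the dynamic type of the object is `cls`:
    // the nearest definition found walking up the inheritance chain.
    std::string resolve(std::string cls, const std::string& method) const {
        while (!cls.empty()) {
            auto it = defines.find(cls);
            if (it != defines.end() && it->second.count(method))
                return cls + "::" + method;
            cls = base.at(cls);
        }
        return "";
    }

    // CHA: every body reachable through a pointer of the given static type,
    // considering the static type and all classes derived from it.
    std::set<std::string> cha_targets(const std::string& static_type,
                                      const std::string& method) const {
        std::set<std::string> targets;
        for (const auto& entry : base) {
            if (!derives_from(entry.first, static_type)) continue;
            std::string body = resolve(entry.first, method);
            if (!body.empty()) targets.insert(body);
        }
        return targets;
    }
};

// The hierarchy of the example: B derives from A, C derives from B,
// foo is defined in A and overridden in B, and C does not override foo.
Hierarchy example_hierarchy() {
    Hierarchy h;
    h.base = {{"A", ""}, {"B", "A"}, {"C", "B"}};
    h.defines = {{"A", {"foo"}}, {"B", {"foo"}}};
    return h;
}
```

A dispatch is resolvable by CHA exactly when cha_targets returns a single body: for static type "B" the set contains only B::foo, whereas for static type "A" it contains both A::foo and B::foo.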

It can be seen by one of ordinary skill that the set of virtual 
call sites resolved by CHA is a superset of the call sites 
resolved by Calder and Grunwald's Unique Name method. 
This is because CHA starts by using the signature of the 
function and then uses the static type information to identify 
additional virtual call sites. 

However, since CHA relies on knowing the set of classes 
derived from a particular static class type, it requires knowl- 
edge of the complete class hierarchy. Therefore, like the 
Resolution by Unique Name Method, CHA can either be 
performed at link-time or at an earlier phase, provided that 
a program database is available which specifies the complete 
class hierarchy. 

Still another method known in the art for removing dead 
code is referred to as "alias analysis." A description of alias 
analysis can be found in several publications, including K. 
Cooper et al., "Fast Interprocedural Alias Analysis", Conference
Record of the Sixteenth ACM Symposium on Principles of
Programming Languages, ACM Press, January 1989, pp.
49-59. Basically, alias analysis processes a program in a 
manner that keeps track of every variable identified during 
compilation, and keeps track of what each variable could 
possibly point to, and iterates repeatedly over each function 
call to determine, rigorously, if the call will occur during 
running of the program. Depending on which particular alias 
analysis algorithm is used, and which language it is applied 
to, and other program-specific parameters, this method often 
requires, in typical cases, approximately fifty to one thousand
iterative inspections of the entire program and of each
function call. Accordingly, alias analysis can be impracticable
for many applications.

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide a method 
for determining statically which body of code will be 
executed when a method is dispatched. 

It is another object of the invention to provide a very fast, 
simple type-inference process for object oriented programs. 

As will be seen from the description below, the present 
invention, which will be called Rapid Type Analysis (or
"RTA"), converts many cases of method dispatches to direct
calls which would not have been converted by the class 
hierarchy analysis (CHA). Further, the present invention can 
potentially reduce the set of possible method bodies that can 
be invoked at a method dispatch site. The RTA method 
accomplishes this function by inspecting each function call 
only once, instead of the repeated inspections required by 
alias analysis. 
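
As a purely illustrative sketch, and not the claimed implementation, the benefit of considering only those classes that a program may actually create can be shown on a hierarchy in which B derives from A, C derives from B, and foo is defined in A and overridden in B. The Program structure, its fields, and the functions derives_from, resolve, and targets are assumptions made for this example, as is the premise that the program creates objects of classes B and C but never of class A.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using ClassName = std::string;

// Hypothetical single-inheritance model of a program under analysis.
struct Program {
    std::map<ClassName, ClassName> base;                 // class -> base ("" if none)
    std::map<ClassName, std::set<std::string>> defines;  // class -> methods it defines
    std::set<ClassName> instantiated;                    // classes the program may create
};

// True if cls is `ancestor` itself or derives from it.
bool derives_from(const Program& p, ClassName cls, const ClassName& ancestor) {
    for (; !cls.empty(); cls = p.base.at(cls))
        if (cls == ancestor) return true;
    return false;
}

// The method body invoked when the dynamic type of the object is `cls`.
std::string resolve(const Program& p, ClassName cls, const std::string& method) {
    for (; !cls.empty(); cls = p.base.at(cls)) {
        auto it = p.defines.find(cls);
        if (it != p.defines.end() && it->second.count(method))
            return cls + "::" + method;
    }
    return "";
}

// Possible targets of a dispatch through a pointer of the given static type.
// With only_instantiated set, dynamic types that the program never creates
// are ignored, which can shrink the set to a single body.
std::set<std::string> targets(const Program& p, const ClassName& static_type,
                              const std::string& method, bool only_instantiated) {
    std::set<std::string> out;
    for (const auto& entry : p.base) {
        const ClassName& cls = entry.first;
        if (!derives_from(p, cls, static_type)) continue;
        if (only_instantiated && !p.instantiated.count(cls)) continue;
        std::string body = resolve(p, cls, method);
        if (!body.empty()) out.insert(body);
    }
    return out;
}
```

With the full hierarchy, a dispatch of foo through a pointer of static type "A" has two candidate bodies and cannot be converted; once the classes never created are excluded, only B::foo remains and the dispatch can become a direct call.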

More particularly, the present invention provides a pro- 
cess for increasing the execution speed of programs which 
comprises the steps of examining a program to identify all 
procedure bodies that can be invoked for a given class and 
a given method, analyzing a procedure body to determine 
whether a method invocation on a pointer can invoke only
one procedure body, and changing either the procedure body 
or the invocation mechanism so that a unique procedure is 
directly called without a test or dispatch being used. 

Further, the present invention provides a method to determine
that certain code in the object oriented program will
never be executed. Such unexecuted code does not need to
be compiled or linked into the final program, which reduces
both compilation time and the size of the compiled program.

There are two basic detections used within the present
invention to establish that code cannot be executed, which
are summarized below. Although these detections are summarized
in the language of flow analysis, in a conservative
manner, the preferred implementation will be an optimistic
one. For informational purposes, a complete technical definition
of "optimistic" can be found in, for example, Constant
Propagation with Conditional Branches, by Mark Wegman
and Frank Kenneth Zadeck, ACM Transactions on Programming
Languages and Systems, volume 13 number 2, April
1991.

The first detection identifies methods for which there are
no dispatches. This detection is exploitable because if there
are no dispatches of a given method then that method
contains only dead code and therefore can be eliminated.
When all such code bodies have been eliminated, there may
be remaining methods which were only invoked from code
bodies now known to be dead code. Therefore, this first
detection can be iterated. It should be noted that "iteration"
as used here is distinct from "iteration" as used in alias
analysis, the latter referring to repeated inspection of the
same functions or methods.

If using a conservative analysis, the present method
assumes at its start that all procedure bodies are dead, with
the exception of those procedures specially defined by the
language to be invoked automatically when the program is
run. For example, for C++ these automatically invoked
procedures would be the procedure named "main" and the
constructor methods of any objects defined in the outer,
"global", scope. These specially designated procedures are
considered to be live. It should be noted that a procedure is
live if it cannot be determined, with certainty, that it will not
be called. In other words, a procedure is live if, with some
particular set of inputs, it may be executed when the
program is run.

Code directly invoked from live code is flagged as live.
This process is iterated until there is no more live code
discovered. All other code is now known to be dead and can
be safely eliminated. As will be described, "optimistic"
analysis can obtain a better result. This can be seen by
considering a routine that invokes itself but is not invoked
from anywhere else. The optimistic analysis will correctly
determine that the code is, in fact, dead, whereas the more
conservative analysis will not.
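
The propagation of liveness from the automatically invoked procedures may be sketched, under stated assumptions, as a simple worklist computation. The CallGraph type, the function live_procedures, and the procedure names used to exercise it are hypothetical; virtual dispatch is ignored here, so each procedure is assumed to list its callees directly.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: procedure name -> procedures it invokes directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Start with only the automatically invoked procedures (the roots) marked
// live, then repeatedly flag code directly invoked from live code as live.
// Each procedure enters the worklist at most once, so each body is
// inspected only once.
std::set<std::string> live_procedures(const CallGraph& calls,
                                      const std::vector<std::string>& roots) {
    std::set<std::string> live(roots.begin(), roots.end());
    std::vector<std::string> worklist(roots.begin(), roots.end());
    while (!worklist.empty()) {
        std::string p = worklist.back();
        worklist.pop_back();
        auto it = calls.find(p);
        if (it == calls.end()) continue;  // p invokes nothing
        for (const std::string& callee : it->second) {
            // insert() reports whether callee was newly added to the live set
            if (live.insert(callee).second) worklist.push_back(callee);
        }
    }
    return live;  // every procedure not in this set is dead
}
```

Because nothing is treated as live until it is reached from a root, a routine that invokes only itself is never added to the live set, which matches the self-invoking routine case described above.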

The second detection identifies classes and derived
classes for which there are no allocations of objects. It is
then known that none of the methods defined for that class
can be invoked. Subsequently it is known that certain code,
i.e., the code relating to these methods, is dead. Next, since
it is then known that the only allocations for certain classes
are within that dead code, this second detection can be
iterated. The present inventive method combines these two
detection mechanisms for discovering dead code. As will be
described, the present invention preferably employs techniques
of optimistic analysis to obtain a stronger result.
Further, in a preferred implementation, the program code is
examined to identify all method bodies that can be invoked
for a given class and a given method. An identified procedure
body is analyzed to determine whether a method
invocation on a pointer can invoke only one method body.
Based on this analysis, either the method body or the
invocation mechanism is changed so that a unique method is
directly called without a test or dispatch being used. More
specifically, rather than using the alias analysis method of
trying to identify the possible types at each virtual function
call site, the methods implemented in this invention are
directed to identifying the set of possible types that can be
created by a program. This greatly simplifies the task.
Because of the design and usage conventions of C++ classes,
the process is almost as precise as more complex alias
analysis methods that attempt to track the possible type of
each individual variable. The result is a reduced processing
time, which can be as much as fifty to one thousand times,
depending on the particular code. Further, the present
method is also easily adapted to allow separate compilation.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages
will be better understood from the following detailed
description of a preferred embodiment of the invention with
reference to the drawings, in which:

FIG. 1 is a block diagram showing a computer system on
which the object oriented dispatch optimization procedure
according to the invention may be implemented; and

FIGS. 2a, 2b, 2c, and 2d constitute a flow diagram
illustrating the logic of the present method.

DETAILED DESCRIPTION OF A PREFERRED
EMBODIMENT OF THE INVENTION

The invention applies to all OOP languages that provide
dynamic method dispatch, including C++, Smalltalk, etc.
The invention assumes that the class created by a NEW
operation is known. In C++, this is always so because the
new operation can only be performed with a type name,
which is static. In languages such as Smalltalk, the NEW
operation can take a type variable, so the class or classes
created must be determined by analysis. To apply this
invention to Smalltalk, it must be supplemented with a
procedure for determining the set of classes that can be
created by each NEW operation.

Referring now to the drawings, and more particularly to
FIG. 1, there is shown a block diagram showing a computer
system 100 on which a preferred embodiment of the present
invention operates. The preferred embodiment includes one
or more application programs 102. One type of application
program 102 is a compiler 105 which includes an optimizer
106. The compiler 105 and optimizer 106 are configured to
transform a source (like an application program 102) program
into optimized executable code. More generally, the
source program is transformed to an optimized form and
then into executable code. A more detailed description of
basic concepts of compilers is found in Alfred V. Aho, Ravi
Sethi and Jeffrey D. Ullman, Compilers: Principles,
Techniques, and Tools, published by Addison-Wesley Publishing
Co. (1986).

The compiler 105 and optimizer 106 operate on a computer
platform 104 that includes a hardware unit 112. The
hardware unit 112 includes one or more central processing
units (CPU) 116, a random access memory (RAM) 114, and
an input/output interface 118. Micro-instruction code 110,
for instance a reduced instruction set, may also be included
on the platform 104. Various peripheral components may be
connected to the computer platform 104 including a terminal
126, a data storage device 130, and a printing device 134. An
operating system 108 coordinates the operation of the various
components of the computer system 100. An example of
computer system 100 as described above is the IBM RISC
System/6000 (RISC System/6000 is a trademark of the IBM
Corporation). It is readily understood that those skilled in the
computer arts will be familiar with many equivalent computer
systems 100.

FIGS. 2a-2d constitute a flow diagram of the invention.
They show a procedure for calculating the set of potentially
live procedures P, and the set of potentially instantiated
classes C. This information can subsequently be used for
function call de-virtualization, code compaction, programming
environments, etc. as described in the subsequent section.
The process depicted in FIGS. 2a-2d applies to OOP
languages in general, both statically typed languages like
C++ and Java, and dynamically typed languages like Smalltalk.
Steps that are dependent on the language semantics are
represented as double boxes. Single boxes represent steps
that apply uniformly to all OOP languages.

The process is described using the notation of set theory
in order to simplify and clarify the presentation. For
increased operating efficiency, the preferred embodiment
uses lists, trees, and hash tables to represent these sets.
Further, iteration of steps is presented in terms of creating a
set and then subsequently removing its elements. For efficiency
in operation the preferred embodiment will implement
this iteration through the marking of set elements. A
generic operation of the process according to the present
invention will now be described:

The process begins by initializing its data sets. More
particularly, the set of live procedures P is initialized in a
language-dependent manner (300). For C++, P is initialized
to main( ) and the constructor methods of all global scope
objects. The set of unexamined live procedures U is set equal
to P (301).

The set of live classes C (those that may be instantiated by
the program under analysis) is initialized in a language-dependent
manner to be those classes which must of necessity
be created (or are already created) when the program is
started (302). For C++, this is the set of classes that are
created as objects in the global scope.

The set of virtual function invocations I is also initialized
to be empty (303). The set I records, in a language-dependent
manner, the kinds of virtual function calls that
have been made by procedures that have been determined to
be live by the process. Each member of the set I is a pair of
<function-name, type-information>. For Smalltalk, the type
information is simply the arity (number of arguments) of the
function called. For C++ and Java, the type information is
the declared static type of the pointer or reference through
which the virtual call was made, and the types of the
arguments of the function.

The process then commences its outer iteration over the
unexamined procedures in U (304). If there are no unexamined
procedures remaining, the process terminates.
Otherwise, a procedure p is removed from U (305).

The set Cp of classes created by procedure p is computed
in a language dependent manner (306). For statically typed
languages, this information is easily available through a
simple static analysis of the procedure body, since when a
new class object is created the class type must be specified
directly. For dynamically typed languages like Smalltalk,
calculating Cp is more complex because the argument to the
NEW operation may be a type variable, not just a type
constant. For Smalltalk, Cp can either be the set of classes
referred to in the procedure body, or else a pre-analysis will
have to take place which calculates the set of possible types
that could be created.

Once the set of created classes Cp has been computed, the
process continues in FIG. 2(b) at B, and iterates over the
elements of Cp (310). If Cp is empty, the process continues
at C, in FIG. 2(c). If Cp is not empty, a class c is removed
from Cp (311).

If c is already in C, the set of live classes, then it has
already been instantiated by another procedure that was
previously examined, and the algorithm goes back for the
next class (312). Otherwise, c is a newly discovered live
class and is added to the set C of live classes (313). Mc is
computed to be the set of virtual methods of class c; for
Smalltalk and other pure OOP languages, all methods are
virtual so Mc will be the set of all methods of c (314).

The process then iterates over the methods in Mc (315).
When all methods have been examined, control returns to
310 and the next class created by the current procedure p is
considered. If there are methods remaining in Mc, the next
method m is removed (316). If m has already been discovered
to be a live procedure (in the set P), then the next
method is considered (317).

Otherwise, a language-dependent test is performed as to
whether the method m is compatible with any of the live
virtual call invocations in I (318). For Smalltalk, this test
simply involves determining whether there is an element
<function-name, arity> contained in I, where function-name
is the name of m and arity is the number of its arguments.
For C++ and Java, the test is more complex: is there a
member of I with the same name and argument types as m,
whose statically declared type is either c (the class type of
method m) or a base class of c? If this test fails, then there
is no method invocation in the live procedures P which is
compatible with m, so control returns to 314 and the next
method of c is examined. If the test succeeds, then m is a
new live method, and is added to both P and U (319); control

described by i. For C++ and Java, it is the set of methods of
the classes C that have the same name, arity, and argument
types as i, and whose class is either the same as the static
class type of i, or is a derived class of the static class type
of i.

The set T' is then computed to be the set T without those
methods already in P, the set of live procedures (326). The
methods in T' are added to P (327) and U, the set of
unexamined live procedures (328).

As stated above, when all of the virtual method invocations
Ip by procedure p have been considered, control passes
to D, shown in FIG. 2(d). The set N is computed to be the
non-virtual functions called by p (330). In a pure OOP
language like Smalltalk, this set is always empty, and the
steps in FIG. 2(d) can be omitted.

The set N' is computed to be the set N without those
procedures already in P, the set of live procedures (331). The
procedures in N' are then added to P (332) and U (333).

Finally, control returns to A in FIG. 2(a). The processing
of procedure p is complete, and a new unexamined live
procedure from the set U is processed, or else the process
terminates.

PARTICULAR IMPLEMENTATION EXAMPLES

The invention has been described in as generic a manner
as possible, to enable one of skill in the computer arts to
readily tailor it and implement it on various particular
languages. In furtherance of that objective the invention has
been described using the mathematical notation of set
theory. A practitioner skilled in the arts will recognize that
these sets can be efficiently implemented using lists, hash
tables, and similar standard data structures.

In an example implementation, the invention is implemented
within a compiler for the C++ language. The compiler
builds a call graph for the program, and the sets P and
U are represented implicitly by boolean variables attached to
each procedure object. Similarly, there is a graph which
represents the class hierarchy, and the set C of live classes

then returns to 315 and the next method of c is examined. 4Q is represented implicitly by a boolean value attached to each 

When all the classes Cp instantiated by procedure p have class object. The set I of call instances is represented by a 

been examined (310) control passes to C in FIG. 2(c). This hash table which is indexed by function signature, 

is the second phase of the algorithm, which handles the The set Cp of classes instantiated by a procedure p is 

virtual call sites in procedure p. The first phase, shown in represented by a list associated with the procedure object. 

FIG. 2(b), handled the new classes created by procedure p. 45 This list is pre-computed in an earlier phase of compilation. 

The set Ip is computed, in a language-dependent manner, Instead of removing elements from the set Cp, our imple- 

to be the virtual functions invocations of the procedure p mentation simply traverses this linked list. 

(320). The invocations are represented in the same way as in Similarly, the set Mc of virtual methods of class c is 

the set I, namely as a pair <function-name, type- pre-computed by the compiler as part of its representation of 

informations where type-information is language- 50 the class hierarchy graph. The set Mc is represented by a 

dependent. hash table, and a standard iterator method is used to examine 

The process then iterates over the virtual function invo- all of the members of set Mc. 
cations Ip (321). When there are no more invocations to The set Ip of virtual calls by procedure p is also pre- 
process, control passes to the final phase of the process, computed by an earlier compiler phase, and the iteration 
labeled D on FIG. 2(d), which is described further below. 55 over this set is implemented by traversal of the linked list. 
While there are invocations remainmg to process, an invo- ^ ^ T of methods of live dasses mat arc ^ tible 
cation I is removed from IP (322) If an invocation identical ^ a vimial call instance . ^ computed by our example 
to I has already been added to I (323) then control returns implementation by finding the signature of i with its asso- 
to 321 an ? £e next invocation is considered. Otherwise, I is ciated class in me class hierarchy graph, and recursively 
added to I (324). 60 looking up the same signature in all of the derived classes 

The set T is computed, m a language-dependent manner, represented in the class hierarchy graph, 
to be all of the methods of live classes (classes in C) that are 

compatible with I (325). A method is compatible with a call Usin S the Information Computed by the Present 

site instance I if a dynamic dispatch at I could potentially be Method 

a call to that method. For Smalltalk, the compatibility 65 The invention described thus far computes various sets of 

computation is simple: it is the set of methods of the classes information regarding an object-oriented program. This 

C that have the same name and arity as the invocation information can be applied in a number of ways: 






1. Resolution of Dynamic Dispatch. A virtual function call 
can be converted to a direct function call when there is only 
one method in P, the set of live procedures, that is compat- 
ible with the function name and type information of the 
called function. If there is more than one compatible
function, but only a small number, the call can be resolved 
by introducing explicit tests for the reduced number of 
possible target functions. 

2. Code Compaction. Code size can be reduced by only
linking or including in the program image those procedures
in P, the set of live procedures. All other procedures have
been proven by the analysis performed by the algorithm to
be dead, or impossible to execute.

3. Programming Environment. A programming environ-
ment can provide a view of the program in which only those
procedures that are live (those in P) are displayed to the
programmer, and only those procedures in P are listed or
displayed as potential targets of virtual function calls.
Classes not in C, the set of live classes, can also be elided.
This elision of information will allow the programmer to
concentrate on the functional parts of the program.
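
By way of illustration only, the first two applications listed above may be sketched in Python as follows. The sketch assumes that the analysis has already produced the set P of live procedures and that each live virtual method is keyed by a <function-name, arity> pair, as in the Smalltalk case. The names resolve_call, compact, live_methods, and max_tests are hypothetical and are not taken from the described implementation; in particular, the threshold max_tests is merely one possible reading of "only a small number".

```python
def resolve_call(signature, live_methods, max_tests=2):
    """Classify one virtual call site.

    signature    -- hypothetical (function-name, arity) pair of the call
    live_methods -- dict mapping each live virtual method in P to its
                    (function-name, arity) pair
    max_tests    -- assumed cutoff for "a small number" of targets
    """
    targets = sorted(m for m, sig in live_methods.items() if sig == signature)
    if len(targets) == 1:
        # Exactly one compatible live method: call it directly.
        return ("direct", targets)
    if 1 < len(targets) <= max_tests:
        # A few compatible live methods: emit explicit tests.
        return ("tests", targets)
    # Otherwise the call site is left as a dynamic dispatch.
    return ("dynamic", targets)


def compact(all_procedures, P):
    """Code compaction: keep only the procedures proven live."""
    return [name for name in all_procedures if name in P]


# Hypothetical analysis results for a small program.
live_methods = {
    "Circle.area": ("area", 0),
    "Circle.draw": ("draw", 0),
    "Square.draw": ("draw", 0),
}
P = {"main", "Circle.area", "Circle.draw", "Square.draw"}

area_call = resolve_call(("area", 0), live_methods)
draw_call = resolve_call(("draw", 0), live_methods)
image = compact(["main", "unused_helper", "Circle.area"], P)
```

In this sketch the call to area resolves to a single direct target, the call to draw is a candidate for explicit tests between two targets, and unused_helper is omitted from the program image because it is not in P.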

While the invention has been described in terms of a 
single preferred embodiment, those skilled in the art will 
recognize that the invention can be practiced with
modification within the spirit and scope of the appended
claims.
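
By way of illustration only, the three phases of the analysis described above in connection with FIGS. 2(b), 2(c), and 2(d) may be sketched in Python as follows. The sketch assumes the Smalltalk-style compatibility test on <function-name, arity> pairs and assumes that the sets Cp, Mc, and Ip have been pre-computed. It uses ordinary Python sets rather than the boolean flags and hash tables of the example implementation. The names Proc, analyze, methods_of, and the sample program are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    name: str            # unique procedure name, e.g. "Circle.area"
    selector: tuple = () # (function-name, arity) if a virtual method
    creates: tuple = ()  # Cp: classes instantiated in the body
    vcalls: tuple = ()   # Ip: (function-name, arity) virtual invocations
    calls: tuple = ()    # N: names of non-virtual callees


def analyze(procs, methods_of, roots):
    """Return (P, C, I): live procedures, live classes, live invocations."""
    P = set(roots)   # live procedures
    U = list(roots)  # unexamined live procedures
    C = set()        # live classes
    I = set()        # live virtual call invocations

    def mark_live(names):
        # Add to both P and U only those procedures not already in P.
        for n in names:
            if n not in P:
                P.add(n)
                U.append(n)

    while U:
        p = procs[U.pop()]
        # First phase, FIG. 2(b): classes created by p.
        for c in p.creates:
            if c in C:
                continue
            C.add(c)
            mark_live(m for m in methods_of.get(c, ())
                      if procs[m].selector in I)
        # Second phase, FIG. 2(c): virtual call sites in p.
        for i in p.vcalls:
            if i in I:
                continue
            I.add(i)
            mark_live(m for c in C for m in methods_of.get(c, ())
                      if procs[m].selector == i)
        # Final phase, FIG. 2(d): non-virtual functions called by p.
        mark_live(p.calls)
    return P, C, I


# Hypothetical program: main creates a Circle and invokes area() on it.
procs = {p.name: p for p in [
    Proc("main", creates=("Circle",), vcalls=(("area", 0),), calls=("log",)),
    Proc("log"),
    Proc("unused"),
    Proc("Circle.area", selector=("area", 0)),
    Proc("Circle.draw", selector=("draw", 0)),
    Proc("Square.area", selector=("area", 0)),
]}
methods_of = {
    "Circle": ("Circle.area", "Circle.draw"),
    "Square": ("Square.area",),
}

P, C, I = analyze(procs, methods_of, ["main"])
print(sorted(P))  # ['Circle.area', 'log', 'main']
```

In this sample, Square.area is not live because Square is never instantiated, and Circle.draw is not live because no live procedure invokes draw; the only live method compatible with the area invocation is therefore Circle.area.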

Having thus described our invention, what we claim as 
new and desire to secure by Letters Patent is as follows: 

1. A computer-implemented method for reducing an 
object oriented program, comprising the steps of:
identifying initial members of a set of live methods and 
procedures within said program, said initial members 
comprising methods and procedures that by definition 
in the program must be executed when the program is 
run;
identifying initial members of a set of live classes within 
said program, said initial members comprising classes 
that must be created when the program is run, based on 
said initial set of live methods and procedures; 
identifying an additional set of live methods and
procedures, comprising the additional methods and 
procedures reachable from the live methods and pro- 
cedures within said set of live methods and procedures, 
and adding said additional set to said set of live 
methods and procedures; 



identifying an additional set of live classes, comprising 
the additional classes potentially creatable by said 
identified set of additional live methods and proce- 
dures; 

converting indirect methods invocations to direct method 
calls which do not perform a runtime type test or 
dynamic dispatch; and 

repeating said steps of identifying an additional set of live 
methods and procedures and of identifying an addi- 
tional set of live classes until no further methods or 
procedures are added to said set of live methods and 
procedures. 

2. A computer-implemented method for reducing an
object oriented program, comprising the steps of: 

identifying initial members of a set of live methods and 
procedures within said program, said initial members 
comprising methods and procedures that by definition 
in the program must be executed when the program is 
run; 

identifying initial members of a set of live classes within 
said program, said initial members comprising classes 
that must be created when the program is run, based on 
said initial set of live methods and procedures; 

identifying an additional set of live methods and 
procedures, comprising the additional methods and 
procedures reachable from the live methods and pro- 
cedures within said set of live methods and procedures, 
and adding said additional set to said set of live 
methods and procedures; 

identifying an additional set of live classes, comprising 
the additional classes potentially creatable by said 
identified set of additional live methods and proce- 
dures; 

repeating said steps of identifying an additional set of live 
methods and procedures and of identifying an addi- 
tional set of live classes until no further methods or 
procedures are added to said set of live methods and 
procedures; and 

removing methods and procedures not identified as live. 






